DETAILED ACTION
Continued Examination Under 37 CFR 1.114 

A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this
application is eligible for continued examination under 37 CFR 1.114, and the fee set
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on
3/8/2007 has been entered. Claims 1, 7, 9, 15, 17, 18, 35, 52, and 53 have been amended.
Accordingly, claims 1-31, 35-44, 46-48, 52, and 53 are pending in this Office action.

Response to Arguments 

Applicant's arguments with respect to claims 1-31, 35-44, 46-48, 52 and 53 have 
been considered but are moot in view of the new ground(s) of rejection. 

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148
USPQ 459 (1966), that are applied for establishing a background for determining
obviousness under 35 U.S.C. 103(a) are summarized as follows:

1. Determining the scope and contents of the prior art.

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

Claims 1-23, 28, 29, 35-40, 52, and 53 are rejected under 35 U.S.C. 103(a) as being
unpatentable over WO 03060766 (hereinafter Lind) (art of record) in view of US
6560597 (hereinafter Dhil).

As for claim 1 Lind discloses: a scoring module determining a score which is 
assigned to at least one concept that has been extracted from a plurality of 
electronically-stored documents (See page 17 lines 20-24 note: definition of document 
corpus) wherein the score is based on at least one of a frequency of occurrence of the 
at least one concept within at least one such document, a concept weight, a structural 
weight, and a corpus weight; (See page 7 lines 20-24) a clustering module forming 
clusters of the documents by evaluating the score for the at least one concept of each 
document for a best fit to the clusters and assigning each document to the cluster with the
best fit (See page 19 lines 4-10). While Lind does not differ substantially from the
claimed invention, the disclosure of a threshold module determining similarities between
the documents grouped into each cluster based on the center of the cluster and the
scores assigned to each of the at least one concepts in each document, dynamically
determining a threshold for each cluster as a function of the similarities, and identifying
and reassigning those documents having the similarities falling outside the threshold is
not necessarily explicit. Dhil, however, does disclose a threshold module determining
similarities between the documents grouped into each cluster based on the center of the
cluster and the scores assigned to each of the at least one concepts in each document,
dynamically determining a threshold for each cluster as a function of the similarities
(See column 3 lines 55-60 and column 5 line 55 - column 6 line 5); and identifying and
reassigning those documents having the similarities falling outside the threshold (See
column 3 lines 60-65). It would have been obvious to an artisan of ordinary skill in the
pertinent art at the time the invention was made to have incorporated the teaching of Dhil
into the system of Lind. The modification would have been obvious because the two
references are concerned with the solution to the problem of efficient document scoring and
clustering; therefore there is an implicit motivation to combine these references. In other
words, the ordinary skilled artisan, during his/her quest for a solution to the cited
problem, would look to the cited references at the time the invention was made.
Consequently, the ordinary skilled artisan would have been motivated to combine the
cited references since Dhil's teaching would enable Lind's users to reclassify
documents based on the center of the cluster.
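For illustration only, the threshold limitation discussed above (similarity of each document to its cluster center, a per-cluster threshold computed from those similarities, and reassignment of documents falling outside it) can be sketched in Python. Neither the claims nor the cited passages as quoted here fix the threshold function, so the rule used below (mean similarity minus k standard deviations) is an assumption, as are all function and parameter names.

```python
import math
from statistics import mean, pstdev

def cosine(a, b):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def find_outliers(vectors, center, k=1.0):
    """Return (threshold, outlier_indices) for one cluster."""
    sims = [cosine(v, center) for v in vectors]
    # Assumed rule (not taken from the references): the threshold is the
    # mean similarity minus k population standard deviations.
    threshold = mean(sims) - k * pstdev(sims)
    return threshold, [i for i, s in enumerate(sims) if s < threshold]

def reassign(vector, centers):
    """Index of the cluster center most similar to the vector."""
    return max(range(len(centers)), key=lambda c: cosine(vector, centers[c]))

# Hypothetical score vectors for four documents placed in one cluster.
cluster = [[1.0, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
threshold, outliers = find_outliers(cluster, center=[1.0, 0.0])
new_homes = [reassign(cluster[i], [[1.0, 0.0], [0.0, 1.0]]) for i in outliers]
```

Because the threshold is recomputed from each cluster's own similarities, a tight cluster and a loose cluster end up with different cut-offs, which is one reading of "dynamically determining".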



As for claim 2 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the score as a function of a summation of at least one of 
the frequency of occurrence, the concept weight, the structural weight, and the corpus 
weight of the at least one concept (See Page 23 lines 1-4). 



Application/Control Number: 10/626,984 Page 5 

Art Unit: 2166 

As for claim 3 the rejection of claim 2 is incorporated, and further Lind discloses: 
a compression module compressing the score through logarithmic compression (See 
page 17 line 30-34). 

As for claim 4 the rejection of claim 1 is incorporated, and further Lind discloses:
the scoring module calculating the concept weight as a function of a number of terms 
comprising the at least one concept (See page 21 lines 25-28). 

As for claim 5 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the structural weight as a function of a location of the at 
least one concept within the at least one such document (See page 18 lines 10-14). 

As for claim 6 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the corpus weight as a function of a reference count of 
the at least one concept over the plurality of documents (See page 18 lines 19-21; note:
this is an inverse weight of the reference count).

As for claim 7 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module forming the score assigned to the at least one concept to a 
normalized score vector for each such document, determining each such similarity 
between the normalized score vector for each such document as an inner product of 
each normalized score vector, and applying the similarity to the best fit criterion (See 
page 30 line 30- page 31 line 1). 

As for claim 8 the rejection of claim 1 is incorporated, and further Lind discloses: 
the clustering module evaluating a set of candidate seed documents selected from the 
plurality of documents, identifying a set of seed documents by applying the score for the 
at least one concept to a best fit criterion for each such candidate seed document, and 
basing the best fit criterion on the score of each such seed document (See page 28 lines
8-16; note: representative = seed).

Claims 9-16 are method claims corresponding to system claims 1-8 respectively, 
and are thus rejected for the reasons set forth in the rejection of claims 1-8. 

Claim 17 is rejected for the same reasons as claim 9. 

As for claim 18 Lind discloses: a scoring module scoring a document in an 
electronically-stored document set comprising: a frequency module determining a 
frequency of occurrence of at least one concept within a document (See page 18 lines 
1-3); and a concept weight module analyzing a concept weight reflecting a specificity of 
meaning for the at least one concept within the document (See page 25 lines 27-30 
note: rtc(t.c) is a value based on meaning); a structural weight module analyzing a 
structural weight reflecting a degree of significance based on structural location within
the document for the at least one concept (See page 18 lines 8-13); a corpus weight
module analyzing a corpus weight inversely weighing a reference count of occurrences
for the at least one concept within the document (See page 18 lines 19-21; note: this is
an inverse weight of the reference count); and a scoring evaluation module evaluating a
score to be associated with the at least one concept as a function of the frequency,
concept weight, structural weight, and corpus weight (See page 21 lines 24-27). While Lind
does not differ substantially from the claimed invention, the disclosure of a threshold
module relocating outlier documents, comprising determining similarities between the
documents grouped into each cluster based on the center of the cluster and the scores
assigned to each of the at least one concepts in each such document, dynamically
determining a threshold for each cluster as a function of the similarities, and identifying
and reassigning the documents with the similarities falling outside the threshold is not
necessarily explicit. Dhil, however, does disclose a threshold module determining
similarities between the documents grouped into each cluster based on the center of the
cluster and the scores assigned to each of the at least one concepts in each document,
dynamically determining a threshold for each cluster as a function of the similarities
(See column 3 lines 55-60 and column 5 line 55 - column 6 line 5); and identifying and
reassigning those documents having the similarities falling outside the threshold (See
column 3 lines 60-65). It would have been obvious to an artisan of ordinary skill in the
pertinent art at the time the invention was made to have incorporated the teaching of Dhil
into the system of Lind. The modification would have been obvious because the two
references are concerned with the solution to the problem of efficient document scoring and
clustering; therefore there is an implicit motivation to combine these references. In other
words, the ordinary skilled artisan, during his/her quest for a solution to the cited
problem, would look to the cited references at the time the invention was made.
Consequently, the ordinary skilled artisan would have been motivated to combine the
cited references since Dhil's teaching would enable Lind's users to reclassify
documents based on the center of the cluster.

As for claim 19 the rejection of claim 18 is incorporated, and further Lind
discloses: the scoring module evaluating the score in accordance with the formula

    S_i = Σ_j (f_ij × cw_ij × sw_ij × rw_ij)

where S_i comprises the score, f_ij comprises the frequency, 0 < cw_ij ≤ 1 comprises
the concept weight, 0 < sw_ij ≤ 1 comprises the structural weight, and 0 < rw_ij ≤ 1
comprises the corpus weight for occurrence j of concept i (See page 23 lines 1-4).
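For illustration only, the scoring formula recited in claim 19 can be sketched in Python as a sum over the occurrences of one concept. The function name and the tuple layout are assumptions made for this sketch, and the upper bound of each weight is assumed to be inclusive.

```python
from typing import Iterable, Tuple

def concept_score(occurrences: Iterable[Tuple[float, float, float, float]]) -> float:
    """Sum f * cw * sw * rw over the occurrences j of one concept i."""
    total = 0.0
    for f, cw, sw, rw in occurrences:
        # Assumed weight range (0, 1]; reject anything outside it.
        for weight in (cw, sw, rw):
            if not 0.0 < weight <= 1.0:
                raise ValueError(f"weight out of range: {weight}")
        total += f * cw * sw * rw
    return total

# Two hypothetical occurrences of one concept: (f, cw, sw, rw).
score = concept_score([(2, 0.5, 1.0, 1.0), (1, 0.75, 0.5, 1.0)])
```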

As for claim 20, the rejection of claim 19 is incorporated, and further Lind
discloses: the concept weight module evaluating the concept weight in accordance with
the formula:

    cw_ij = 0.25 + (0.25 × t_ij),        1 ≤ t_ij ≤ 3
            0.25 + (0.25 × [7 - t_ij]),  4 ≤ t_ij ≤ 6
            0.25,                        t_ij ≥ 7

(See page 17 lines 30-34).
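For illustration only, the piecewise concept weight of claim 20 can be sketched as a small Python function of the term count. The function name is hypothetical, and the range bounds are assumed to be inclusive so that every integer term count of one or more falls in exactly one branch.

```python
def concept_weight(t: int) -> float:
    """Concept weight for a concept made up of t terms (t >= 1)."""
    if t < 1:
        raise ValueError("a concept has at least one term")
    if t <= 3:
        # Weight rises with the term count for short concepts.
        return 0.25 + 0.25 * t
    if t <= 6:
        # Weight falls again for longer concepts.
        return 0.25 + 0.25 * (7 - t)
    # Seven or more terms: constant floor.
    return 0.25

weights = [concept_weight(t) for t in range(1, 8)]
```

Under these assumptions the weight peaks at three and four terms and is symmetric around that peak.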

As for claim 21, the rejection of claim 19 is incorporated, and further Lind
discloses: the structural weight module evaluating the structural weight in accordance
with the formula:

    sw_ij = 1.0, if (j ≈ SUBJECT)
            0.8, if (j ≈ HEADING)
            0.7, if (j ≈ SUMMARY)
            0.5, if (j ≈ BODY)
            0.1, if (j ≈ SIGNATURE)

where sw_ij comprises the structural weight for occurrence j of each such concept i (See
page 21 lines 25-29).
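For illustration only, the structural weight of claim 21 amounts to a lookup keyed on where an occurrence appears in the document. The dictionary and function names below are hypothetical, and treating the location as an exact string label is an assumption of this sketch.

```python
# Weights as recited in claim 21, keyed by structural location.
STRUCTURAL_WEIGHTS = {
    "SUBJECT": 1.0,
    "HEADING": 0.8,
    "SUMMARY": 0.7,
    "BODY": 0.5,
    "SIGNATURE": 0.1,
}

def structural_weight(location: str) -> float:
    """Structural weight for one occurrence, given its location label."""
    try:
        return STRUCTURAL_WEIGHTS[location.upper()]
    except KeyError:
        raise ValueError(f"unknown structural location: {location!r}") from None

subject_weight = structural_weight("subject")
```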

As for claim 22, the rejection of claim 19 is incorporated, and further Lind
discloses: the corpus weight module evaluating the corpus weight in accordance with
the formula:

    rw_ij = ((T - r_ij) / T)^2,  r_ij > M
            1.0,                 r_ij ≤ M

where rw_ij comprises the corpus weight, r_ij comprises a reference count for occurrence j
of each such concept i, T comprises a total number of reference counts of documents in
the document set, and M comprises a maximum reference count of documents in the
document set (See page 23 lines 20-23).
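For illustration only, the corpus weight of claim 22 can be sketched in Python. The function and parameter names are hypothetical, and the branch boundary is assumed to be inclusive on the lower side (reference counts up to and including M receive the full weight).

```python
def corpus_weight(r: int, total: int, max_count: int) -> float:
    """Corpus weight for a reference count r, given T (total) and M (max_count)."""
    if total <= 0:
        raise ValueError("total reference count must be positive")
    if r <= max_count:
        # Concepts referenced no more than M times are not discounted.
        return 1.0
    # More widely referenced concepts are discounted quadratically.
    return ((total - r) / total) ** 2

rare = corpus_weight(5, total=100, max_count=10)
common = corpus_weight(50, total=100, max_count=10)
```

The quadratic term is what makes this an inverse weight: the more documents reference a concept, the smaller its contribution to the score.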

As for claim 23, the rejection of claim 19 is incorporated, and further Lind
discloses: a compression module compressing the score in accordance with the formula
S'_i = log(S_i + 1), where S'_i comprises the compressed score for each such concept i
(See page 27 lines 1-7).
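For illustration only, the logarithmic compression of claim 23 can be sketched in Python. The text does not state the base of the logarithm, so the natural logarithm used here is an assumption, as is the function name.

```python
import math

def compress_score(score: float) -> float:
    """Compressed score log(score + 1); natural log is assumed."""
    if score < 0:
        raise ValueError("score must be non-negative")
    # log1p(x) computes log(1 + x) accurately for small x.
    return math.log1p(score)

compressed = [compress_score(s) for s in (0.0, 1.0, 10.0, 100.0)]
```

Adding one before taking the logarithm keeps a zero score at zero while flattening large scores.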

As for claim 28 the rejection of claim 18 is incorporated, and further Dhil
discloses: a plurality of candidate seed documents (See column 5 lines 35-42); a
similarity module determining a similarity between each pair of a candidate seed
document and a cluster center (See column 5 lines 55-60); a clustering module
designating each such candidate seed document separated from substantially all cluster
centers with such similarity being sufficiently distinct as a seed document, and grouping
each such candidate seed document not being sufficiently distinct into a cluster with a
nearest cluster center (See column 3 lines 60-65). It would have been obvious to an
artisan of ordinary skill in the pertinent art at the time the invention was made to have
incorporated the teaching of Dhil into the system of Lind. The modification would have
been obvious because the two references are concerned with the solution to the problem of
efficient document scoring and clustering; therefore there is an implicit motivation to
combine these references. In other words, the ordinary skilled artisan, during his/her
quest for a solution to the cited problem, would look to the cited references at the time
the invention was made. Consequently, the ordinary skilled artisan would have been
motivated to combine the cited references since Dhil's teaching would enable Lind's
users to reclassify documents based on the center of the cluster.



As for claim 29 the rejection of claim 28 is incorporated, and further Dhil
discloses: a plurality of candidate seed documents; a similarity module determining a
similarity between each pair of a candidate seed document and a cluster center; a
clustering module designating each such candidate seed document separated from
substantially all cluster centers with such similarity being sufficiently distinct as a seed
document, and grouping each such candidate seed document not being sufficiently
distinct into a cluster with a nearest cluster center (See column 6 lines 25-40).

Claims 35-40 are method claims comprising substantially the same limitation as 
system claims 18-23, and are thus rejected for the reasons set forth in the rejection of 
claims 18-23. 

Claim 46 is a method claim corresponding to system claim 29 and is thus
rejected for the same reason as set forth in the rejection of claim 29.

Claim 52 is rejected for substantially the same reasons as claim 35.

Claim 53 is an apparatus claim corresponding to method claim 18 and is thus
rejected for the same reasons as claim 18.



Claims 24-27 and 41-44 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Lind and Dhil as applied to claims 18 and 35 above, and further in
view of US 6675159 (hereinafter Klein) (art of record).

As for claim 24 the rejection of claim 18 is incorporated, and further Klein 
discloses: a global stop concept vector cache maintaining concepts and terms (See 
column 18 lines 17-20 and See column 14 lines 45-49); and a filtering module filtering 
selection of the at least one concept based on the concepts and terms maintained in the 
global stop concept vector cache (See column 14 lines 45-50). It would have been 
obvious to an artisan of ordinary skill in the pertinent art at the time of the invention to 
have incorporated the teachings of Klein into the system of Lind. The modification would 
have been obvious because queries and documents are linked by the fact that words are
the entities being processed. Therefore, any transformation capable of being
made to a query should also be able to be applied to documents; this makes all document
management systems more efficient and easier to maintain.

As for claim 25 the rejection of claim 18 is incorporated, and further Klein
discloses: a parsing module identifying terms within at least one document in the
document set, and combining the identified terms into one or more of the concepts (See
column 2 lines 53-56). 

As for claim 26 the rejection of claim 25 is incorporated, and further Klein 
discloses: the parsing module structuring each such identified term in the one or more 
concepts into canonical concepts comprising at least one of word root, character case, 
and word ordering (See column 14 lines 63-67). 

As for claim 27 the rejection of claim 25 is incorporated, and further Klein 
discloses wherein at least one of nouns, proper nouns and adjectives are included as 

Claims 41-44 are method claims corresponding to system claims 24-27,
respectively, and are thus rejected for the same reasons as set forth in the rejection of
claims 24-27.

Claims 30, 31, 47, and 48 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Lind and Dhil as applied to claim 29 above, and further in view of
Klein. 

As for claim 30 the rejection of claim 29 is incorporated, and further Klein 
discloses: a normalized score vector for each document comprising the score 
associated with the at least one concept for each such concept occurring within the 
document (See column 3 lines 18-21); and the similarity module determining the 
similarity as a function of the normalized score vector associated with the at least one 
concept for each such document (See column 18 lines 23-26). 

As for claim 31, the rejection of claim 30 is incorporated, and further Klein
discloses: the similarity module calculating the similarity in accordance with the formula

    cos σ_AB = (S_A · S_B) / (|S_A| |S_B|)

where cos σ_AB comprises a similarity between a document A and a document B, S_A
comprises a score vector for document A, and S_B comprises a score vector for
document B.
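For illustration only, the cosine similarity recited in claim 31 can be sketched in Python over two score vectors of equal length. The function name and the plain-list representation of a score vector are assumptions of this sketch.

```python
import math
from typing import Sequence

def cosine_similarity(s_a: Sequence[float], s_b: Sequence[float]) -> float:
    """Inner product of two score vectors divided by the product of their norms."""
    if len(s_a) != len(s_b):
        raise ValueError("score vectors must have the same length")
    dot = sum(x * y for x, y in zip(s_a, s_b))
    norm_a = math.sqrt(sum(x * x for x in s_a))
    norm_b = math.sqrt(sum(y * y for y in s_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical score vectors for documents A and B.
same_direction = cosine_similarity([3.0, 4.0], [3.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

If the score vectors are normalized to unit length beforehand, the denominator is one and the similarity reduces to the inner product alone.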

Claims 47 and 48 are method claims corresponding to the system of claims 30 
and 31 respectively and are thus rejected for the same reasons as set forth in the 
rejection of claim 30. 



Information 



Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Leon J. Harper whose telephone number is
571-272-0759. The examiner can normally be reached from 7:30 AM to 4:00 PM.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Hosain T. Alam, can be reached on 571-272-3978. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

Leon J. Harper
May 25, 2007 




Primary Examiner 



